I have set echo=FALSE so that most of the code chunks will not display. Please refer to the .Rmd file for source code.

1 Loading packages and importing data

1.1 Packages

Details of packages, etc, are given below.

## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] patchwork_1.0.0 ggthemes_4.2.0  see_0.2.1       latex2exp_0.4.0
##  [5] ggridges_0.5.1  tidybayes_2.0.2 brms_2.12.0     Rcpp_1.0.3     
##  [9] forcats_0.4.0   stringr_1.4.0   dplyr_0.8.3     purrr_0.3.3    
## [13] readr_1.3.1     tidyr_1.0.2     tibble_2.1.3    ggplot2_3.2.1  
## [17] tidyverse_1.2.1 bookdown_0.18  
## 
## loaded via a namespace (and not attached):
##  [1] colorspace_1.4-1          rsconnect_0.8.15         
##  [3] markdown_1.1              base64enc_0.1-3          
##  [5] rstudioapi_0.10           rstan_2.19.2             
##  [7] svUnit_0.7-12             DT_0.8                   
##  [9] fansi_0.4.1               lubridate_1.7.4          
## [11] xml2_1.2.2                bridgesampling_0.7-2     
## [13] knitr_1.25                shinythemes_1.1.2        
## [15] bayesplot_1.7.0           jsonlite_1.6             
## [17] broom_0.5.2               shiny_1.3.2              
## [19] compiler_3.6.1            httr_1.4.1               
## [21] backports_1.1.5           assertthat_0.2.1         
## [23] Matrix_1.2-17             lazyeval_0.2.2           
## [25] cli_2.0.1                 later_0.8.0              
## [27] htmltools_0.3.6           prettyunits_1.1.1        
## [29] tools_3.6.1               igraph_1.2.4.1           
## [31] coda_0.19-3               gtable_0.3.0             
## [33] glue_1.3.1                reshape2_1.4.3           
## [35] cellranger_1.1.0          vctrs_0.2.2              
## [37] nlme_3.1-140              crosstalk_1.0.0          
## [39] insight_0.5.0             xfun_0.8                 
## [41] ps_1.3.0                  rvest_0.3.4              
## [43] mime_0.7                  miniUI_0.1.1.1           
## [45] lifecycle_0.1.0           gtools_3.8.1             
## [47] zoo_1.8-6                 scales_1.1.0             
## [49] colourpicker_1.0          hms_0.5.0                
## [51] promises_1.0.1            Brobdingnag_1.2-6        
## [53] parallel_3.6.1            inline_0.3.15            
## [55] shinystan_2.5.0           yaml_2.2.0               
## [57] gridExtra_2.3             loo_2.2.0                
## [59] StanHeaders_2.19.0        stringi_1.4.5            
## [61] bayestestR_0.3.0          dygraphs_1.1.1.6         
## [63] pkgbuild_1.0.6            rlang_0.4.4              
## [65] pkgconfig_2.0.3           matrixStats_0.55.0       
## [67] evaluate_0.14             lattice_0.20-38          
## [69] rstantools_2.0.0          htmlwidgets_1.3          
## [71] tidyselect_0.2.5          processx_3.4.1           
## [73] plyr_1.8.5                magrittr_1.5             
## [75] R6_2.4.1                  generics_0.0.2           
## [77] pillar_1.4.3              haven_2.1.1              
## [79] withr_2.1.2               xts_0.11-2               
## [81] abind_1.4-5               modelr_0.1.5             
## [83] crayon_1.3.4              arrayhelpers_1.0-20160527
## [85] rmarkdown_2.1             grid_3.6.1               
## [87] readxl_1.3.1              callr_3.4.1              
## [89] threejs_0.3.1             digest_0.6.23            
## [91] xtable_1.8-4              httpuv_1.5.1             
## [93] stats4_3.6.1              munsell_0.5.0            
## [95] shinyjs_1.0

Use multiple cores and set number of iterations.

1.2 Import Data

Here we import our data and make some sumamry plots.

1.2.1 EEG Data

Importing the odd-harmonic EEG data from six electrodes region-of-interest over occipital cortex.

## Observations: 400
## Variables: 3
## $ wg      <chr> "P2", "PM", "PG", "CM", "PMM", "PMG", "PGG", "CMM", "P4"…
## $ subject <chr> "s01", "s01", "s01", "s01", "s01", "s01", "s01", "s01", …
## $ rms     <dbl> 0.4013, 0.6555, 0.5547, 0.7635, 0.9185, 0.7285, 0.4320, …

It’s clearly skewed, and negative display duration are impossible, so will fit a glm with family = 'lognormal.

Data are the root-mean-squaured (rms) over the odd-harmonic filtered waveforms.

Figure 1.1: Data are the root-mean-squaured (rms) over the odd-harmonic filtered waveforms.

1.2.2 Threshold Data

Here we import that data and select the columns that we’re interested in. Threshold gives the required display duration (in seconds) for the two stimuli to allow for accurate discrimination.

## Observations: 186
## Variables: 3
## $ subject   <chr> "person10", "person10", "person10", "person10", "perso…
## $ wg        <chr> "CM", "CMM", "P2", "P3", "P31M", "P3M1", "P4", "P4G", …
## $ threshold <dbl> 0.74125, 0.20216, 0.47697, 0.35012, 0.24529, 0.19022, …

As above, a summary of the data.

Histogram of the display duration thresholds.

Figure 1.2: Histogram of the display duration thresholds.

Again, we have a skewed distribution, so will fit with family = 'lognormal'.

1.2.3 Control Data

In addition to the primary EEG data set, we are also importing two control data sets which are (a) even harmonic data from the same occipital electrodes, and (b) odd harmonic data from six parietal electrodes (see main paper).

2 Bayesian Analysis

Here are the details of the Bayesian multi-level modelling. Our general approach is:

2.1 Define Priors

In this section we will specify some priors. We then then use a prior-predictive check to assess whether the prior is reasonable or not (i.e., on the same order of magnitude as our measurements).

2.1.1 Fixed Effects

Our independent variable is a categorical factor with 16 levels. We will drop the intercept from our model and instead fit a coefficent for each factor level (\(y \sim x - 0\)). As our dependant variable will be log-transformed, we can use the priors below:

prior <- c(
  set_prior("normal(0,2)", class = "b"),    
  set_prior("cauchy(0,2)", class = "sigma"))

2.1.2 Group-level Effects

We will keep the default weakly informative priors for the group-level (‘random’) effects. From the brms documentation:

[…] restricted to be non-negative and, by default, have a half student-t prior with 3 degrees of freedom and a scale parameter that depends on the standard deviation of the response after applying the link function. Minimally, the scale parameter is 10. This prior is used (a) to be only very weakly informative in order to influence results as few as possible, while (b) providing at least some regularization to considerably improve convergence and sampling efficiency.

2.1.3 Prior Predictive Check

Now we can specify our Bayesian multi-level model and priors. Note that as we are using sample_prior = 'only', the model will not learn anything from our data.

m_prior <- brm(data = d_eeg, 
  rms ~ wg-1 + (1|subject),
  family = "lognormal", 
  prior = prior, 
  iter = n_iter ,
  sample_prior = 'only')

We can use this model to generate data.

## Observations: 320,000
## Variables: 2
## $ key   <chr> "P2", "P2", "P2", "P2", "P2", "P2", "P2", "P2", "P2", "P2"…
## $ value <dbl> 0.58997505, 2.38399527, 0.16118726, 12.75673347, 23.216127…
The density plot shows the distribution of the empirical data, while the blue line shows the 66% and 95% prediction intervals.

Figure 2.1: The density plot shows the distribution of the empirical data, while the blue line shows the 66% and 95% prediction intervals.

We can see that i) our priors are relatively weak as the predictions span several orders of magnitide, and ii) our empirical data falls within this range.

2.2 Compute Posterior

2.2.1 Fit Model to EEG Data

We will now fit the model to the data.

##  Family: lognormal 
##   Links: mu = identity; sigma = identity 
## Formula: rms ~ wg - 1 + (1 | subject) 
##    Data: d_eeg (Number of observations: 400) 
## Samples: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
##          total post-warmup samples = 20000
## 
## Group-Level Effects: 
## ~subject (Number of levels: 25) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.38      0.06     0.29     0.52 1.00     1233     2365
## 
## Population-Level Effects: 
##        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## wgCM      -0.49      0.09    -0.66    -0.31 1.00      695     1064
## wgCMM     -0.12      0.09    -0.29     0.06 1.00      689      982
## wgP2      -0.95      0.09    -1.12    -0.77 1.00      690      967
## wgP3      -0.82      0.09    -0.99    -0.64 1.00      701     1050
## wgP31M    -0.43      0.09    -0.60    -0.26 1.01      704     1054
## wgP3M1    -0.14      0.09    -0.31     0.04 1.00      694      969
## wgP4      -0.48      0.09    -0.65    -0.30 1.01      678      973
## wgP4G     -0.26      0.09    -0.42    -0.08 1.00      689      887
## wgP4M      0.15      0.09    -0.01     0.33 1.00      699     1209
## wgP6      -0.48      0.09    -0.65    -0.30 1.00      696     1020
## wgP6M     -0.05      0.09    -0.21     0.13 1.00      687     1003
## wgPG      -0.93      0.09    -1.10    -0.75 1.00      704     1064
## wgPGG     -0.72      0.09    -0.89    -0.55 1.01      698     1106
## wgPM      -0.59      0.09    -0.76    -0.41 1.00      703      982
## wgPMG     -0.26      0.09    -0.42    -0.08 1.00      691     1001
## wgPMM     -0.07      0.09    -0.24     0.11 1.00      689     1043
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.23      0.01     0.21     0.25 1.00     8411    11033
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

We will now have at what the model predicts for the average participant (i.e, ignoring the random intercepts). If anybody else would like to suggest over sensible ways to summarise this model, do let me know! I’ve been trying to use some of the functions from the tidybayes package, and they don’t (yet?) have much on random effects.

The density plot shows the distribution of the empirical data, while the blue line shows the 66% and 95% prediction intervals.

Figure 2.2: The density plot shows the distribution of the empirical data, while the blue line shows the 66% and 95% prediction intervals.

2.2.2 Fit Model to Display Duration Threshold Data

We will now fit the model to the data.

##  Family: lognormal 
##   Links: mu = identity; sigma = identity 
## Formula: threshold ~ wg - 1 + (1 | subject) 
##    Data: d_dispthresh (Number of observations: 186) 
## Samples: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
##          total post-warmup samples = 20000
## 
## Group-Level Effects: 
## ~subject (Number of levels: 12) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.39      0.11     0.23     0.65 1.00     3320     6391
## 
## Population-Level Effects: 
##        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## wgCM      -0.22      0.17    -0.54     0.12 1.00     3204     5211
## wgCMM     -1.15      0.17    -1.47    -0.81 1.00     3193     4952
## wgP2      -0.29      0.17    -0.62     0.05 1.00     3098     4734
## wgP3      -0.69      0.17    -1.00    -0.35 1.00     3030     5036
## wgP31M    -1.18      0.16    -1.49    -0.85 1.00     3190     4791
## wgP3M1    -1.35      0.16    -1.66    -1.02 1.00     3103     4735
## wgP4      -0.89      0.17    -1.20    -0.55 1.00     3162     5310
## wgP4G     -1.22      0.17    -1.54    -0.88 1.00     3297     5490
## wgP4M     -1.29      0.17    -1.61    -0.95 1.00     3063     5266
## wgP6      -1.20      0.17    -1.53    -0.86 1.00     3168     4805
## wgP6M     -1.42      0.17    -1.74    -1.08 1.00     3269     4816
## wgPG       0.36      0.17     0.04     0.70 1.00     3343     5404
## wgPGG     -0.31      0.16    -0.62     0.03 1.00     3238     4975
## wgPM      -0.79      0.17    -1.11    -0.44 1.00     3190     5267
## wgPMG     -0.96      0.17    -1.28    -0.63 1.00     3201     5025
## wgPMM     -1.17      0.17    -1.49    -0.84 1.00     3209     5332
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.42      0.02     0.37     0.47 1.00    13509    13539
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
The density plot shows the distribution of the empirical data, while the blue line shows the 66% and 95% prediction intervals.

Figure 2.3: The density plot shows the distribution of the empirical data, while the blue line shows the 66% and 95% prediction intervals.

2.2.3 EEG Control Data

We will also fit models to the control data. As we can see from Figure 2.4, the group differences are much smaller.

The density plot shows the distribution of the empirical data, while the blue line shows the 66% and 95% prediction intervals.

Figure 2.4: The density plot shows the distribution of the empirical data, while the blue line shows the 66% and 95% prediction intervals.

3 Subgroup Comparisons

We will now compute the difference between sub- and super-groups.

3.1 EEG RMS

Finally, I calculate the probability that the difference between subgroup and supergroup is larger than zero (small than zero for the display duration) given the data. This information is then binned so we can colour in the posterior density plots.

Now we will use it!

Distributions of the difference in mean log(rms) between sub- and super-groups. The index of each relationship is indicated by the colour of the y-axis label. The fill of the density plots indicated the probability of the difference being greater than zero.

Figure 3.1: Distributions of the difference in mean log(rms) between sub- and super-groups. The index of each relationship is indicated by the colour of the y-axis label. The fill of the density plots indicated the probability of the difference being greater than zero.

3.2 Display Duration Threshold

We can do the same for the display duration thresholds.

Distributions of the difference in mean log display duration threshold between sub- and super-groups. The index of each relationship is indicated by the colour of the y-axis label. The fill of the density plots indicated the probability of the difference being less than zero.

Figure 3.2: Distributions of the difference in mean log display duration threshold between sub- and super-groups. The index of each relationship is indicated by the colour of the y-axis label. The fill of the density plots indicated the probability of the difference being less than zero.

3.3 Control Data

We will now do exactly the same with the control data…

Distributions of the difference in mean log(rms) between sub- and super-groups. The index of each relationship is indicated by the colour of the y-axis label. The fill of the density plots indicated the probability of the difference being greater than zero.

Figure 3.3: Distributions of the difference in mean log(rms) between sub- and super-groups. The index of each relationship is indicated by the colour of the y-axis label. The fill of the density plots indicated the probability of the difference being greater than zero.

Distributions of the difference in mean log(rms) between sub- and super-groups. The index of each relationship is indicated by the colour of the y-axis label. The fill of the density plots indicated the probability of the difference being greater than zero.

Figure 3.4: Distributions of the difference in mean log(rms) between sub- and super-groups. The index of each relationship is indicated by the colour of the y-axis label. The fill of the density plots indicated the probability of the difference being greater than zero.

3.4 Summary

We can summarise the subgroup comparison plots above by plotting ROC curves for each of our four measurements (Figure 3.5).

This figure shows how many of our 64 comparisons are classed as having a greater-than-zero difference (less-than-zero for the display durations) for difference thresholds. between 0.5 an 1.0.

Figure 3.5: This figure shows how many of our 64 comparisons are classed as having a greater-than-zero difference (less-than-zero for the display durations) for difference thresholds. between 0.5 an 1.0.

## # A tibble: 1 x 5
##     eeg threshold occ_even par_odd     p
##   <dbl>     <dbl>    <dbl>   <dbl> <dbl>
## 1    56        49       32      22  0.95

4 Additional Analysis

4.1 Index and Normality

Subgroup relations can be classified by their index, and wheter they are normal or not. Here we investigate the extent to which these two variables can account for the variation between the subgrop relationships.

##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: mean_value ~ as.numeric(as.character(index)) + normal 
##    Data: comp_summary$eeg (Number of observations: 64) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##                             Estimate Est.Error l-95% CI u-95% CI Rhat
## Intercept                       0.05      0.09    -0.13     0.24 1.00
## as.numericas.characterindex     0.04      0.01     0.02     0.07 1.00
## normal                          0.14      0.07     0.00     0.27 1.00
##                             Bulk_ESS Tail_ESS
## Intercept                       2913     2223
## as.numericas.characterindex     3138     2878
## normal                          3132     2656
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.19      0.02     0.16     0.23 1.00     3124     2383
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: mean_value ~ as.numeric(as.character(index)) + normal 
##    Data: comp_summary$threshold (Number of observations: 64) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##                             Estimate Est.Error l-95% CI u-95% CI Rhat
## Intercept                      -0.20      0.17    -0.54     0.14 1.00
## as.numericas.characterindex    -0.05      0.03    -0.10     0.00 1.00
## normal                         -0.02      0.12    -0.27     0.22 1.00
##                             Bulk_ESS Tail_ESS
## Intercept                       2843     2136
## as.numericas.characterindex     3022     2509
## normal                          2979     2791
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.35      0.03     0.29     0.42 1.00     3553     2570
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

4.2 Correlation Between EEG RMS and Display Duration Threshold

Finally, we will investigate whether there is a correlation between the EEG rms measures and our display duration thresholds. As our two different measures come from different samples of participants, we are unable to do a direct comparison. However, we can use the results of the models discussed in Section 3 and check for a correlation between the predicted .

##     Estimate  Est.Error       Q2.5     Q97.5
## R2 0.1613049 0.07449397 0.02652427 0.3098456

We can see that although the correlation is relatively weak, our confidence interval indicates that we can be reasonably positive that \(R^2>0\) (i.e, 95% credible interval is 0.026 - 0.302).

Scatter plot showing the correlation between our two measures. Each line is a sample from the posterior of a Bayesian linear regression.

Figure 4.1: Scatter plot showing the correlation between our two measures. Each line is a sample from the posterior of a Bayesian linear regression.